1.  Introduction


The  complexity  of  the  Army  mission  often  impels  the  scientist  toward  creative  problem-solving. 
Principled  approaches  to  Army  challenges  will  draw  pertinent  features  from  several  related
disciplines.  Because  missions  are  unique,  the  techniques  considered  have  usually  not  yet  been
implemented  in  academic  and  industrial  research  and  development  (R&D)  settings.

Nevertheless,  with  testing  and  enhancement  in  mission-oriented  labs  such  as  the  U.S.  Army 
Research  Laboratory  (ARL),  customized  solutions  result  in  effectiveness  breakthroughs. 
Moreover,  when  solutions  are  based  on  sound  and  relevant  principles  and  disciplines,  they  also 
carry  the  potential  for  Army  reuse  to  inform  a  general  framework.

The  technique  outlined  here,  for  selecting  the  technical  terms  to  include  in  glossaries  designed  to 
aid  human  linguists  with  foreign  language  word  and  phrase  look-up  and  enhance  automatic 
processes,  such  as  machine  translation  (MT),  was  developed  in  support  of  Combined  Joint  Inter- 
Agency  Task  Force  435  (CJIATF-435)  in  Afghanistan. 

As  detailed  below,  ARL  had  taken  responsibility  for  a  coordinated  solution  to  address  the  need 
for  rapid  and  high  quality  Afghan  language  translation  with  a  proposal  for  integrated  materiel 
development,  which  could  subsequently  be  leveraged  by  an  existing  Army  machine  foreign 
language  translation  acquisition  program  such  as  the  Machine  Foreign  Language  Translation 
System  (MFLTS),  the  Army’s  Program  of  Record  for  MT. 

The  work  of  a  Joint  Task  Force  (JTF)  is,  by  definition,  a  team  performance  by  individuals  with
varying  expertise,  perspectives,  and  skills,  toiling  together  toward  common  goals.  While  the 
present  method  underpins  a  capability  that  serves  only  one  specific  group,  the  foundation  of  the 
method  explained  here  justifies  its  use  to  inform  glossary  building  and  MT  tasks  for  similar  JTFs 
operating  at  various  strategic  locations. 


2.  Motivation  and  Background 


We  were  doubly  motivated  in  undertaking  this  particular  task.  Our  primary  interest  was  in 
meeting  an  immediate  mission  need  and,  in  fulfilling  that  requirement,  we  also  wanted  to  exploit 
a  fundamental  principle  of  linguistic  cognition,  that  is,  that  language  use  induces  expectations. 

2.1  Mission  Requirements 

At  the  request  of  the  Office  of  Director  Defense  Research  and  Engineering  (DDR&E),  the  Army 
Director  of  Capabilities  Integration,  Prioritization,  and  Analysis  drew  up  an  Execution  Plan  to 
respond  to  the  Joint  Urgent  Operational  Need  of  Improved  Machine  Based  Language  Translation 
for  Afghan  Languages  (JUON  CC-0429).  Section  Four  of  that  plan  states  that  “[ARL]  will  build
a  comprehensive  glossary  of  organization  names,  acronyms,  and  technical  terms  from  the  legal 
and  criminal  justice  domains  with  a  target  size  of  5,000  words,  [. . .]  will  compile  this  new  Task 
Force  glossary  as  an  electronic  file  using  acronyms  and  other  items  found  on  the  HarmonieWeb 
portal  at  the  Rule  of  Law  site,  [and]  will  elicit  glossary  items  from  points  of  contact  at  the 
CJIATF  headquarters  [. . .]  and  other  staff  sections.”  It  goes  on  to  indicate  that  English-Dari  as  a 
language  pair  will  be  given  priority  over  English-Pashto  and  that  the  glossaries  will  be 
reformatted  for  dual  use  as  user-specific  dictionaries  in  MT  software.

2.2  Expectation  Grammar  Theory 

It  goes  without  saying  that  one  wants  the  words  in  one’s  language  technology  glossary  to  include 
those  domain  terms  that  occur  in  the  material  that  supports  the  bilingual  work.  Thus,  physicians 
want  to  see  medical  terms,  attorneys  want  to  see  legal  terms,  and  Soldiers  want  to  see  military 
terms.  This  is  the  assumption  that,  reasonably,  underpins  the  concept  of  user-dictionary  as  a
feature  of  the  linguists’  automated  look-up  tools. 

Less  frequently  noted  is  the  logic  behind  the  incorporation  of  these  valuable  handcrafted 
resources — characterized  by  extremely  precise  renderings — into  tools  for  the  automatic 
processing  of  semantic  equivalence  of  text  in  two  or  more  languages,  as  in,  for  example, 
automatic  translators  and  multilingual  summarizers.  Since  language  is  a  system  of  signs,  or 
sound-meaning  duals,  agreed  upon  by  a  community,  it  is  a  human  phenomenon  that  develops  not 
only  individually  but  also,  necessarily,  at  group  level.  In  fact,  throughout  life,  our  idiolect,  or 
individual  system  of  linguistic  choices,  becomes  an  important  means  by  which  we  express  our 
identity  with  one  or  more  groups. 

Viewed  another  way,  the  language  use  of  the  group  or  subgroup  can  constitute  a  sublanguage, 
often  referred  to  as  jargon  for  a  professional  group  or  argot  for  less  well-defined  groups. 
Meanings  associated  with  jargon  terms  are  tied  to  a  specific  concept,  typical  activity,  or 
prevalent  attitude  displayed  in  that  community.  What  makes  them  valuable  for  the  purposes  of 
automatic  language  processing,  especially  as  embodied  in  finely  tuned  glossaries,  is  that  their 
semantic  structure  is  devoid  of  ambiguity. 

For  readers  of  MT  output  who  are  familiar  with  the  jargon  of  the  community  served  by  the 
incorporated  glossary,  encountering  and  understanding  a  well-translated  term  with  a  specific 
sense,  or  a  well-rendered  name  with  a  specific  referent,  is  akin  to  the  second  language  learners’ 
experience  of  encountering,  in  a  challenging  second  language  text,  a  word  or  phrase  that  they 
understand  the  meaning  of  and  actually  realize  that  they  understand  it.  In  second  language 
learning,  this  is  known  as  “comprehensible  input”  (Krashen,  1981). 

This  “input”  to  the  MT  reader’s  cognitive  process  is  helpful  in  two  ways.  First,  it  gives  an 
unambiguous  sense  to  the  segment  translated  and  thereby  increases  the  reader’s  confidence  in  the 
fidelity  of  the  automatic  rendering  as  a  whole.  Second,  it  triggers  world  or  “professional  world” 
knowledge,  which  is  related  to  the  irrevocably  understood  concept.  The  freshly  triggered
knowledge  permits  the  reader  to  interpret  the  text  with  a  higher  probability  of  accuracy  than 
would  otherwise  be  possible.  According  to  one  theorist,  with  confidence,  understanding  and 
related  knowledge,  readers  generate  “grammar-based  expectancies”  or  hypotheses  about  event 
sequences  analogous  to  the  plans  of  the  author,  in  the  MT  condition,  of  the  source  language  text 
(Oller,  1983).

The  congruent  expectancies  then  increase  the  likelihood  of  accurate  understanding.  This  idea 
was  supported  in  an  ARL  pilot  study  of  human  acceptability  judgments  on  MT  output. 
Acceptability  judgments  were  compared  on  two  versions  of  output,  one  in  which  names  were 
accurately  rendered  (A  Set)  and  another  in  which  names  were  inaccurately  rendered  (I  Set). 

Using  a  Magnitude  Estimation  (ME)  methodology  in  which  subjects  made  a  direct  numerical 
estimation  of  the  degree  to  which  sentences  in  the  data  conveyed  the  meaning  in  the  reference 
sentences,  investigators  found  a  34.8%  difference  between  the  A  Set  and  I  Set  judgments.
Automatic  evaluation  with  Meteor  (Lavie  and  Agarwal,  2007)  showed  a  22%  difference  between
A  Set  and  I  Set  scores.*  A  differential
effect  was  thus  detected,  suggesting  that  weighting  proper  name  rendering  in  automated 
evaluation  systems  may  improve  the  reliability  of  these  systems. 


3.  Methodology 


A  glossary  prepared  for  one  document  in  particular,  the  first  text  in  Stanford  Law  School’s 
Afghan  Legal  Education  Project  (ALEP),  An  Introduction  to  the  Law  of  Afghanistan,  will  serve 
to  illustrate  the  method  employed  throughout  the  project  for  glossary  development.  This 
document  and  a  human  translation  of  it  are  freely  available  online  (Stanford,  2011). 

3.1  Automation 

The  first  step  is  an  automatic  process  for  culling  frequently  occurring  content  words  from  a  text. 
This  step  is  necessary  when  a  text  is  particularly  lengthy.  The  ALEP  document  contained  234 
pages,  so  a  human  effort  alone  would  have  been  prohibitively  time-consuming.  Instead,  we
identified  two  publicly  available  terminology  extractor  tools:  TerMine  (NaCTEM,  2011)  and 
Alchemy  (Alchemy API,  2011;  Rose,  2011). 

*  Meteor  is  an  automatic  metric  for  MT  evaluation,  which  has  demonstrated  high  correlation  with  human  judgments  of
translation  quality,  significantly  outperforming  the  more  commonly  used  BLEU  metric.

TerMine  evaluates  a  candidate  term  based  on  four  corpus  statistical  characteristics  related  to  the
term:  its  length,  its  occurrence  frequency,  its  frequency  as  part  of  other  longer  candidate  terms,
and  the  number  of  these  longer  candidate  terms.  The  formula  that  determines  termhood  and  is
incorporated  into  the  algorithm  is  called  C-value.  This  measure  accounts  for  nested  terms.  The
method  then  recognizes  term  context  words  and  incorporates  information  from  the  context  words
into  the  term  extraction  process  (Frantzi,  Ananiadou,  and  Mima,  2000).

The  AlchemyAPI  Web  site  provided  two  tools,  an  Interactive  Demonstration  and  a 
Keyword/Terminology  Extractor.  Output  from  the  former,  which  constituted  a  subset  of  the 
latter,  was  marked  by  high  precision,  and  that  from  the  latter,  by  high  recall.  The  Alchemy 
approach  contrasts  with  that  used  in  TerMine  in  that  Alchemy  tags  the  text  with
information  categories,  such  as  person,  location,  and  organization,  in  addition  to  returning  topic
keywords.  Output  from  both  TerMine  and  the  Alchemy  Keyword/Terminology  Extractor  was
submitted  for  human-in-the-loop  selection.
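
One way the two extractor outputs could be combined into a single list for the human reviewer is sketched below. The report does not describe how the lists were merged, so the function name, the normalization (lowercasing and collapsing spaces), and the ordering rule are all assumptions.

```python
def merge_candidates(termine_terms, alchemy_terms):
    """Combine two extractor outputs into one review list.

    Each argument is an iterable of candidate strings. The result is a list
    of (term, sources) pairs in which candidates proposed by both tools come
    first, followed by the rest in alphabetical order.
    """
    sources = {}
    for name, terms in (("TerMine", termine_terms), ("Alchemy", alchemy_terms)):
        for raw in terms:
            # Assumed normalization: ignore case and repeated whitespace.
            key = " ".join(raw.lower().split())
            if key:
                sources.setdefault(key, set()).add(name)
    return sorted(
        ((term, sorted(names)) for term, names in sources.items()),
        key=lambda item: (-len(item[1]), item[0]),
    )
```

Lowercasing is convenient for matching but discards the capitalization of proper names, so a production list would probably keep the original surface form alongside the normalized key.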

3.2  Selection  and  Context  of  Use 

The  criteria  used  in  selecting  terms  for  this  project  follow  conceptual  constructs  in  the  corpus
linguistics  research  literature,  especially  “context  of  use”  (Biber,  Conrad,  and  Reppen,  1998). 
According  to  this  principle,  a  word  or  expression  can  have  a  unique  meaning  within  a  given 
community  or  situational  setting.  The  word  or  multi-word  expression  can  also  be  associated 
with  that  context  without  any  change  in  general  meaning. 

When  settings  and  groups  determine  an  agreed-upon  sense,  an  expression  may  occur  outside  a 
given  context  of  use,  but  with  a  different  meaning.  For  example,  “lower  house”  and  “upper 
house”  in  a  “governing”  context  refer  to  legislative  assemblies.  The  exact  same  phrases,
however,  in  a  “geographic  location”  context,  refer  to  the  placement  of  residences.  This  is  not  to
say  that  the  sense,  let’s  call  it  S1,  in  the  first  context,  which  we’ll  call  C1,  can  never  occur  in  the
second  context,  C2,  and  vice  versa.  It  only  indicates  that  S1  exists  and  is  distinct  from  S2,  the
sense  the  expression  has  in  the  second  context.

When  a  sense,  S1,  associated  with  a  context,  C1,  does  occur  in  a  well-defined  separate  context,
C2,  there  may  still  be  subtle  changes  along  different  semantic  lines  of,  for  example,  register  or
emphasis.  Full  names  are  examples  of  expressions  in  a  formal  register  that  refer  to  a  single
person,  S1,  and  are  usually  reserved  for  formal  occasions  and  documents,  C1,  such  as
ceremonies  and  forms.  But  in  informal  settings,  C2,  such  as  family  gatherings,  full  names,  while
maintaining  their  S1  reference,  may  merely  be  used  for  emphasis  with  the  referent,  the  person
being  referred  to,  remaining  unchanged.

What  is  important  about  the  “sense  by  association”  aspect  of  the  “context  of  use”  principle  is  that 
a  word  or  expression  can  also  be  highly  correlated  with  a  group  or  context  while  maintaining  the 
same  meaning  both  inside  and  outside  of  the  context.  The  frequency  and  typicality  of  occurrence 
in  one  given  context  of  use  is  what  gives  the  word  or  expression  its  unique  meaning  and  its 
membership  in  a  semantic  category  associated  with  that  context.†  Consider  the  case  of  the
conjunction,  “notwithstanding,”  a  word  associated  with  legal  and  other  formal  written  contexts.
Its  sense  as  a  marker  of  discourse  function  is  not  lost  in  other,  non-legal,  contexts,  only  less
frequent  and  typical.  As  a  result,  “notwithstanding”  might  be  considered  part  of  the  legal
lexicon.

†  Its  occurrence  at  the  syntactic  level  is  beyond  the  scope  of  this  report.

With  this  in  mind,  we  populated  the  glossary  for  this  project  with  words  and  expressions  having 
a  unique  meaning  or  high  frequency  of  occurrence  in  the  specific  context  of  use  of  nation-building
in  Afghanistan,  as  judged  by  examination  and  semantic  analysis  of  the  CJIATF-435
material. 
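
The "frequency and typicality" half of this criterion can be approximated numerically. The sketch below, which is an illustration and not the procedure used in the project, compares a word's relative frequency in the mission material with its relative frequency in a general reference text. The function name, the add-one smoothing, and the toy token lists are assumptions. The "unique meaning" half of the criterion still requires human semantic analysis.

```python
from collections import Counter


def typicality(domain_tokens, reference_tokens):
    """Ratio of each domain word's relative frequency in the domain text to
    its relative frequency in a general reference text.

    Add-one smoothing keeps the ratio finite for words that never occur in
    the reference text. A ratio well above 1 marks a word as typical of the
    domain context of use.
    """
    dom, ref = Counter(domain_tokens), Counter(reference_tokens)
    vocab_size = len(set(dom) | set(ref))
    dom_total = sum(dom.values()) + vocab_size
    ref_total = sum(ref.values()) + vocab_size
    return {
        word: ((dom[word] + 1) / dom_total) / ((ref[word] + 1) / ref_total)
        for word in dom
    }


# Toy token lists for illustration only.
domain = "notwithstanding the law the court".split()
reference = "the cat saw the dog".split()
ratios = typicality(domain, reference)
```

Here "notwithstanding" receives a higher ratio than "the", which occurs equally often in both toy texts.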

The  CJIATF-435  is  tasked  with  setting  standards  of  behavior  for  detention  facilities,  defining 
elements  of  parliamentary  structure,  reporting  on  police  actions,  and  providing  lessons  for 
training,  background  for  leadership  development,  and  information  for  and  about  other  initiatives. 
They  thus  rely  on  text  types,  such  as  press  reports,  presentations,  handbooks,  and  instruction 
manuals,  among  other  material.  There  is  no  one  domain  or  one  genre  that  adequately  captures 
the  linguistic  variety  that  the  resources  under  construction  will  be  designed  to  handle. 

Because  the  notion  of  context-of-use  transcends  the  traditional  concepts  of  domain  and  genre,  it 
is  a  useful  rubric  for  deciding  which  lexical  items  logically  to  include  in  the  JUONS  glossaries, 
customizers,  and  bilingual  training  datasets.  Single  lexical  items,  as  well  as  multi-word  noun-based
and  verb-based  expressions,  will  be  found  in  the  lists.  Technical  terminology,  as  a
category  of  reference  within  communities  of  common  interest  (CCI),  is  a  set  of  words  whose 
context  of  use  is  kept  constant.  For  example,  among  the  medical  community  in  medical  settings, 
one  hears  the  terms,  coronary  infarction,  arterial  sclerosis,  edema,  angina,  etc.  As  a  category 
then,  technical  terms  in  a  CCI  function  in  a  manner  similar  to  named  entities  in  a  CCI  consisting
of  speakers  acquainted  with  the  named  entity.  That  is,  for  each  term,  CCI,  and  context-of-use,
there  is  only  one  sense  and,  for  each  named  entity,  CCI,  and  context-of-use,  there  is  only  one
referent.‡

Technical  terms  are  generally  included  in  the  glossaries.  They  may  also  be  embedded  in  context- 
of-use  expressions.  In  these  cases,  the  term  is  extracted  to  stand  alone  as  a  single  lexical  item. 
The  rest  of  the  expression  is  then  reevaluated  according  to  the  criteria  described  earlier.  As  for 
the  forms  included,  we  limited  ourselves,  for  this  first  pass,  to  the  forms  that  occurred  in  the
material,  leaving  the  questions  of  which  ideally  to  include  or  how  ideally  to  process  the  forms  for 
future  work. 
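
The extraction of an embedded technical term can be pictured as follows. This is a simplified sketch under stated assumptions: the function name is hypothetical, matching is done on whitespace-separated words without regard to case, and the example expression is invented. The remainder it returns is what would go back through the selection criteria.

```python
def split_embedded_term(expression, technical_terms):
    """Pull the longest known technical term out of an expression.

    Returns (term, remainder). If no known term is embedded, returns
    (None, expression) unchanged.
    """
    words = expression.split()
    lowered = [w.lower() for w in words]
    # Try longer terms first so that the most specific term wins.
    for term in sorted(technical_terms, key=lambda t: -len(t.split())):
        target = term.lower().split()
        n = len(target)
        if n == 0:
            continue
        for i in range(len(words) - n + 1):
            if lowered[i:i + n] == target:
                remainder = " ".join(words[:i] + words[i + n:])
                return term, remainder
    return None, expression
```

For example, given the invented expression "petition for habeas corpus relief" and the known term "habeas corpus", the function returns the term on its own and leaves "petition for relief" to be reevaluated.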


‡  Again,  the  issue  of  ambiguous  references,  to  include  those  to  entity  referents  with  the  same  name  or  one  entity  with  two
names  (open  research  questions  in  their  own  right),  is  beyond  the  scope  of  this  report.



3.3  Translation 

The  next  step  in  the  process  of  lexical  development  to  support  the  building  of  mission-specific 
statistical  machine  translation  (SMT)  systems  and  glossaries  is  the  translation  of  the  selected 
terminology.  Once  a  final  list  of  terms  is  established,  the  developer  inputs  the  selections  to  the
latest  SMT  system-in-progress  to  produce  a  list  of  translated  terms.§

For  all  the  reasons  that  the  SMT  is  still  incomplete,  that  is,  faulty  alignments,  out-of-domain 
training  data,  and  inconsistent  segmentation  and  spelling,  among  others,  the  list  of  translations  in 
the  output  is  sparse  and  error-laden.  However,  the  ratio  of  the  number  of  valid  or  fairly  close 
renderings  to  the  number  of  decidedly  unhelpful  ones  is  generally  high  enough  to  justify  the 
effort  in  automation.  The  bilingual  list  output  assists  the  project’s  native  speaker  linguists,  or 
“humans-in-the-loop,”  by  saving  them  time  and  tedium  searching  for  and  consulting  about 
appropriate  terminological  translations.  In  this  way,  the  selection  step  serves  also  to  lighten  the
burden  on  the  native  speaker  linguist  whose  job  it  is  to  ensure  the  quality  of  the  SMT  support  to 
the  mission  project,  not  unlike  the  pipeline  mechanism  for  preparing  bilingual  corpora  for  the 
linguist’s  review;  see  Tanenbaum,  LaRocca,  and  Morgan  (2011). 
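
The hand-off from automatic draft translation to human review might be organized as in the sketch below. The record layout, the status labels, and the `translate` callable are assumptions. The callable is a stand-in for whatever SMT system is in progress, and no real MT interface is implied.

```python
from dataclasses import dataclass


@dataclass
class GlossaryEntry:
    source: str                    # selected English term
    draft: str                     # machine-produced rendering, possibly empty
    status: str = "needs_review"   # every draft is checked by a linguist


def draft_entries(terms, translate):
    """Pair each selected term with a draft rendering for human review.

    `translate` is any callable that maps a source string to a target
    string, or to None or "" when the system produces nothing usable.
    """
    entries = []
    for term in terms:
        draft = (translate(term) or "").strip()
        # Empty output is routed to the linguist as a fresh translation task.
        status = "needs_review" if draft else "needs_translation"
        entries.append(GlossaryEntry(term, draft, status))
    return entries


# A dictionary lookup stands in for the SMT system in this toy example.
toy_system = {"court": "<draft rendering>"}
queue = draft_entries(["court", "lower house"], toy_system.get)
```

No entry is marked as final by the automatic step, which reflects the point made above that the machine output saves the linguist search time but does not replace the linguist's judgment.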


4.  Discussion  and  Future  Work 


Progress  in  the  direction  of  greater  automation  without  a  sacrifice  of  quality  in  this  context  relies 
on  the  vast  linguistic  knowledge  that  can  only  be  supplied  by  the  human  language  specialist, 
subject  matter  expert,  and  linguist.  Thus,  much  of  the  automation  developed  and  used  in  support 
of  mission-focused  MT  development  goes  toward  facilitating  the  work  of  the  human  linguist, 
that  is,  alleviating  repetitive,  tedious,  and  time-consuming  tasks.  For  example,  the  pipeline 
system,  noted  in  section  3,  is  geared  to  harvesting,  cleaning,  aligning,  and  presenting  to  a 
language  specialist  two  semantically  equivalent  texts,  in  different  languages,  segment  by 
segment. 

Without  that  automation,  the  highly  qualified  language  specialists  would  be  obliged  to  spend 
much  of  their  time  cutting-and-pasting  the  texts  from  the  Web  page  and  reformatting  them  to
eliminate  noise  elements,  as  encountered.  This  means  their  work  would  consist,  for  the  most
part,  of  deleting  reoccurring  mark-up,  stray  text,  framing,  and  advertisements;  editing
misspellings,  spacing,  and  formatting  errors;  and  inserting  appropriate  text  for  both  halves  of  the
bilingual  corpus.  Needless  to  say,  pipeline  automation  affords  the  project  a  considerable  savings 
in  terms  of  the  cost  and  mental  fatigue  of  the  linguists/language  specialists,  who,  with  the
pipeline,  can  stay  energized  by  contributing  their  unique  and  sophisticated  linguistic  acumen  to
the  effort.

§  Morgan  (2011)  describes  the  project-specific,  human-in-the-loop  SMT  system-building  methodology.

The  same  is  true  of  the  term  selection  process.  Burdening  the  linguist  and  subject  matter  expert 
with  the  task  of  repeatedly  translating  frequently  occurring  terms  and  substantives,  which  are 
errorful  and  light  on  content,  makes  inefficient  and  inappropriate  use  of  their  time  and  talents, 
which,  at  the  end  of  the  day,  is  cost  ineffective.  By  contrast,  what  we  have  presented  here  is  a 
method  for  term  selection  that  is  based  on  sound  and  relevant  principles  of  automation  and
linguistics.  Its  value  lies  in  its  mission  effectiveness,  which  can  only  be  measured  by  putting  it 
into  practice  for  human-in-the-loop  foreign  language  system  and  resource  development.  If,  with 
use,  the  latter  serves  to  increase  Soldier  effectiveness,  then  it  is  our  hope  that  the  method  will 
become  a  standard  and  that  the  concepts  it  embodies  will  inform  a  general  framework  for 
development  of  foreign  language  glossaries  and  MT  resources. 


List  of  Symbols,  Abbreviations,  and  Acronyms


ALEP          Afghan  Legal  Education  Project
ARL           U.S.  Army  Research  Laboratory
CCI           communities  of  common  interest
CJIATF-435    Combined  Joint  Inter-Agency  Task  Force  435
DDR&E         Office  of  Director  Defense  Research  and  Engineering
JTF           Joint  Task  Force
ME            Magnitude  Estimation
MT            machine  translation
R&D           research  and  development
MFLTS         Machine  Foreign  Language  Translation  System
SMT           statistical  machine  translation